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1.   Introduction 

N  N 

Let   R  be   a  subset  of  IR     and  let  F:      R  -*■  E     be   a  nonlinear   function. 

* 

We  wish  to  consider  computing  a  solution  x  of  the  system  of  nonlinear 

equations 

F(x)   =   0,  (1.1) 

vhere   the  Jacobian  matrix 

F'(.x)=f  (x) 

is   sparse,    symmetric,   and  positive   definite.      Although  our  results    apply  in 
general,  nonlinear  systems   arising   in  the  numerical   solution  of  partial 
differential   equations   are  of  particular   interest. 

A  very  important  method  for  the   solution  of   (l.l)    is  Newton's 

method   (cf.    Ortega  and  Rheinboldt    [ll]).      For  a  given   initial   guess   x        , 

-   .+        +  (1)         (2)         (3) 

we   generate   a  sequence  of  iterates  x        ,   x        ,   x        ,    ...    by 

j^)    .    ;W    .   jW,  (1.2) 

(k) 

where   6  is  the   solution  to   the   sparse  linear  system 

F'(x(k))    6(k)   =   F(x(k)).  (1.3) 

Under  suitable   assumptions   on  x  and  F,    the   sequence   is  well    defined  and 

converges   quadratically  to  x*,    i.e. , 

m      «r  i  ,     .,     .     II    (k+1)  *i  i  2k,  ,    (k)  »nt     tt 

3c   <  1   such  that    [|x  -  x  £   c      ||x  -x||. 

A  straightforward  implementation  of  Newton's  method  could  use 

sparse   symmetric  Gaussian  elimination   (cf.    Eisenstat,   Schultz,    and  Sherman 


t  Throughout  this  paper  we  use      | • |      to   denote   the  Euclidean  norm. 

++   In  the  notation   of  Ortega  and  Rheinboldt    [ll]  ,   this    condition   defines 
R-quadratic   convergence.      This   is   slightly  weaker  than   the  more   standard 
definition  of  Q-quadratic   convergence,   i.e., 

3c'    <  1   such  that        x  -  x  ^   c'      x  -  x 


2 
[8],   Sherman    [13])   to   solve   the  linear  systems  which  arise.      Unfortunately , 

this  would  "be   inefficient  computationally  because   at  each  step   it  would 

(k)  (k) 

require  the   computation  of  F(x        )    and  F'(x        )    and  the  exact  solution  of 

(1.3).      Also,   a  large   amount  of  storage  would  be   required  for  the 

(k) 
factorization  of  F' (x        ). 

A  number  of  modifications   have  been  proposed  to  reduce   some  of  the 
costs   of  Newton's  method,   notably  the   simplified  Newton's   or  chord  method 
and  a  variety  of  Newton-iterative  methods    (cf.    Ortega  and  Rheinboldt    [ll]). 
For   certain  problems,  these  methods  may  require  less   storage  or  work  than 
Newton's  method.      However,    computational   difficulties   are  typical   in  the 
solution  of  nonlinear  systems,   and  each  of  these  methods   has   its    disadvan- 
tages.     The   choice  of  method  for  a  specific  nonlinear  system  (l.l)    depends 
quite  heavily  on  the  properties  of  F  and  on  the   storage   and  time    constraints 
imposed  by  the   computational  environment. 

The   simplified  Newton's  method  replaces   F' (x        )   with  F'(x        )    in 
(l.3).      This    avoids   the    computation  of  F' (x        )    at   each  step   and  reduces 

the   cost  of  solving    (1.3)    for  k   5  1,   since,    if  F'(x        )    is    factored  into 

T 
the  product  U  DU  at  the   first   step,    each  succeeding  step  only  requires 

forward  and  backsolving  three  triangular  systems.      However,  the    chord 

t 
method  converges  only  linearly     and  requires   the  same   amount  of  storage 

as  Newton's  method. 

The  Newton— iterative  methods   use   a  linear  iterative  method    (such  as 

SOR,    cf.    Ortega  and  Rheinboldt    [ll],   p.    215)    to   approximate   the   solution  of 

(1.3)    instead  of  solving   it  exactly.      The   sequence  of  iterates   generated  by 


(k)  * 

t     The   iterates  x  converge   linearly  to  x     if 

u  4.x.        11    Ck+1)  *,  1  I,    (k)  *, 

3c   <  1   such  that        x  -  x  £  c     x         -  x 


3 
such  methods  depends  upon  the  particular  iterative  method  chosen  and  the 
criteria  used  to  stop  the  inner  iteration.   The  advantages  of  Newton- 
iterative  methods  are  that  they  require  only  a  small  amount  of  storage  and 

(k)  * 

that,  when  x  is  near  x    ,  they   can  take   advantage  of  the   fact   that   zero 

(k) 
is   a  good  initial   approximation  to   6     "   .      The  problem  with  Newton-iterative 

methods   is   that   their  rate  of   convergence   is   essentially   determined  "by  the 

rate  of   convergence  of  the   inner   (linear)    iterative  method  applied  to   the 

linearized  problem   (1.3).      Thus   they  encounter  all  of  the   usual   difficulties 

with  linear  iterative  methods   relating  to  parameter  estimation,    starting 

guesses,    and  stopping   criteria. 

In  this  paper  we  present  the  Newton-Richardson  methods    (cf. 
Eisenstat,   Schultz,   and  Sherman    [7],   Sherman   [13]),   a  class   of  Newton- 
iterative  methods  which  use   a  Richardson-D' Jakonov  iteration    (cf.    Richardson 
[12],   D'Jakonov   [h])    to   approximate   the   solution  of   (1.3)    at  each  step. 
The  Newton-Richardson  methods   require   the   same   amount  of  storage   as  Newton's 
method  or  the   chord  method,  but   if  2     steps   of  the   inner  iteration   are   taken 
at  the  k-th  step,   they  will   converge   quadratically  with  less  work  per  step 
(on   average)   than  with  Newton's  method.      And  as    compared  to  most  other 
Newton-iterative  methods,   the  Newton-Richardson  methods  have   the   advantage 
that  their  rate  of  convergence  is   essentially  independent  of  N. 

In  the  next   section  we   describe   a  model   semilinear  partial 
differential   equation  which  will  be  used  to  motivate   the  Newton-Richardson 
methods.      In  Section   3  we   discuss   the   implementation  of  Newton's  method  and 
show  how  to    derive  Newton-Richardson  methods    for  the  model   problem.      In 
Section  U,   we   examine  the  general   theory  of  Newton-iterative  methods   and 
state   several  local  convergence   results.      In  Section  5  we    consider  the 


k 

ramifications   of  the   theory   for  the  model  problem,  and  finally,    in  the 
appendix  we  prove   the   results   stated  in  Section  h. 


2.      A  Model   Problem 

In  this   section  we   describe   a  model  problem  arising  in   the  numerical 
solution  of  semilinear  partial   differential   equations    and  show  how  to   apply 
Newton's  method  to   it.      In   the  next  section,  we  will  motivate   the  Newton- 
Richardson  methods  by   analyzing  the   costs   of  Newton's  method  and  certain  of 
its  modifications    for  this  model   problem. 

Let  D  be   the  unit  square    (0,l)x(0,l)    and  let    3D  denote   the  boundary 

of  D.      The  model  problem  we  wish  to   solve   is  the  nonlinear  Poisson  equation 

-A  v  =   g(v)  in  D 

(2.1) 
with  v  =   0  on   3D, 

where  g  satisfies 

gv(v)    £   X    <   A 

(where   A  is   the   fundamental  eigenvalue   of  -A  on  D)  ,   so   that   v(x,y)    exists 

and  is   unique    (cf.   Ortega  and  Rheinboldt    [ll],   pp.    110,   ll6). 

To  obtain   an  approximate   solution  to    (2.1)  ,   the    continuous  problem 

must  be   reduced  to   a   discrete   form.      Several   techniques    are   available  to   do 

this,   and  we   consider  the  use  of  five-point   finite   difference   approximations. 

We   superimpose  on  D  a  uniform  square  n*n  mesh  D    ,  with  boundary   3D    ,   and 

replace   the    differential  operator  -A   in    (2.1)   with  the   standard  five-point 

difference   approximation  on  D,  .      (Here  h  =   — r*  is   the   vertical  or  horizontal 
^  h  n+1 

distance  between   adjacent  mesh  points.)      Letting  V. .  be   the   approximation 
to   v(ih,    jh)  ,  we  obtain  the   system  of  equations: 

-V.   .     .    -   V.  ■       .    -  V.     .   _    -  V._,_-     ,  +  UV.  ,   =  h2g(V.  .) 
l-l, j  i,J-l  i,,]+l  i+l,  J  ij  ij 

for   (ih,    jh)    e  Dh  (2.2) 

V       =   0  for  (ih,   jh)    e   3Dh. 

2 

Letting  N  =  n      and  ordering  the  mesh  points  with  the  natural,   or  row-by-row 


(2.3) 


ordering,  we  may  write  these  equations  more  concisely  as  an  N*N  system  of 
nonlinear  equations 

AV  =  G(V)  , 
in  which 

V  =  {Vij:  (ih,  jh)  e  Dh}, 


G(V)  =  {h  g(V   ):  (ih,  jh)  e  ly, 


and  A  is  the  n*n  block  tridiagonal  matrix 


A  = 


-I   XB 

where  I   is   the  nxn  identity  matrix  and  B  is   the  n><n   tridiagonal  matrix 

*U     -1> 


B  = 

-1 
N-l       H 

Under  the   conditions   imposed  on    (2.1),   this   system  of  equations  has   a 
unique   solution  V  which  is    a  close   approximation  to   v(ih,    jh)    (cf.    Ortega 
and  Rheinboldt    [ll],   p.    112). 

Newton's  method  may  be   applied  in   a  straightforward  way  to   solve 
(2.3)    for  V.      Given   an  initial   guess   V        ,  we   generate   a  sequence  of 
successive   approximations   V        ,    V        ,   V        ,    ...    to   V  by 


v(k+l)  _  y(k)  _  6(k) 


(k)  . 


where  6    is  the  solution  of  the  system  of  linear  equations 
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j(y(k))    6(k)   =  AV(k)    -  G(V(k))  (2.1+) 

and 

j(y(k))  =  a  -  ~  (v(k)). 

(k) 
Under  the   assumptions  made   above,   J(V        )    is    an  N*N  sparse,    symmetric, 

positive   definite  matrix  having  the   same    zero   structure   as  A   (i.e.  , 

(j(V(k))).  .   4  0   if  and  only  if  A.  .   i   0).      If  V^    is   sufficiently  close   to 

V,   then  Newton's  method  converges   quadratically  to   the   solution  of   (2.3) 

independent  of  h    (cf.    Greenspan   and  Parter   [10]),    and  only  0(log  log  N) 

Newton  iterations    are   required  to   reduce  the   initial   error    |  [ V         -  v|  | 

2        1 
hy   a  factor  of  h     =  — ,  the   discretization  error. 


t     Throughout   this  paper  we   use  log(x)    to   denote   log_(x) 


8 
3.      Newton-Richardson  Methods   for  the   Model  Problem 

As  we   saw  in  the  last   section,   each  step  of  Newton's  method  for  the 
model  problem  requires   the   solution  of  a  sparse   symmetric  linear   system. 
Thus  Newton's  method  really   amounts   to   the   solution  of  a  sequence  of  related 
linear  systems,   and  its   costs    are   directly  related  to   the  manner  in  which 
these   systems    are   solved.      In  this    section  we   consider  a  number  of  approaches 
to   solving  the   systems    (2.U),   and  we   introduce  Newton-Richardson  methods 
for  the  model  problem   (cf.    Eisenstat,   Schultz ,   and  Sherman    [7],   Sherman    [13]). 

In  the   standard  implementation  of  Newton's  method  for     the  model 

problem,    the   linear  systems  which  arise   can  be   solved  with  sparse   symmetric 

Gaussian  elimination    (cf.    Eisenstat,   Schultz,   and  Sherman    [7,    8],    Sherman 

[13]).      Each  solution  requires   0(N  log  N)    storage  locations   for  the   factors 
r(kh      rt,„3/2,    ..„.__., ,,.„_  „__      .  T„T(k) 


o 


f  J(VV    ;),   0(N        )    arithmetic  operations   to    factor  J(V      ;),    and  0(N  log  N) 


arithmetic  operations   to   solve  the   resulting  triangular  systems    (cf.    Sherman 
[13]).      For  V  sufficiently   close   to   V,   Newton's  method  converges   quad- 

ratically,   and  the   initial  error    |  | V  -  v|      can  be   reduced  by   a  factor  of 

2        1 
h     =  —  in  O(log  log  N)    steps    at   a  total   cost  of  0(N   log  N)    storage   locations 

3/2 
and  0(N        log  log  N)    arithmetic  operations. 

One   of  the   reasons   that  Newton's  method  requires   so  many   arithmetic 

(k) 

operations  is  that  it  does  not  exploit  the  fact  that  the  iterates  V    are 

converging  quadratically  to  V.   The  sequence  of  systems  (2.H)  is  solved  as 

if  the  systems  were  unrelated,  when,  in  fact,  their  solutions  rapidly 

(k) 

approach   zero  as   V  converges   to  V.      To  exploit   this   situation,  we   could 

use   an  iterative  method  to   approximate   the   solution  of   (2.U)    at  each  step. 
Most  Newton-iterative  methods   require  only  0(N)    storage  locations    for  the 
model  problem,   and  some   require   fewer  arithmetic  operations   than  Newton's 
method.      Unfortunately,  however,    for  the  model  problem  the   rates   of 


9 
convergence  of  most  Newton-iterative  methods    depend  on  the  mesh  spacing  h 
(see  Section   5)    in  such   a  way  that   they  have  higher  asymptotic   arithmetic 
costs   than  Newton's  method.      For  example,   even   though  Newton-SOR  converges 

linearly,   the   rate   of  convergence   depends   on  h,   and  the  method  requires 

1/2  2 

0(N        log  N)    iterations   to   reduce  the   error  by   a  factor  of  h      (see  Section 

5).      If  the    rate   of  convergence  were   independent  of  h,   then  only  O(log  N) 

iterations  would  be  required. 

We  now  present  the  Newton-Richardson  methods   for  the  model  problem. 
These  methods   require  more   storage   than  most  other  Newton-iterative  methods 
(0(N   log  N)    locations),  but  they  converge   at   rates  which  are   independent  of 
h.      The  methods    are  based  on  the  principles   embodied  in   the  Strongly 
Implicit  Procedures    developed  by  Stone    [lU],    Dupont    [6],   Diamond   [3],   and 
others.      The  basic   idea  is   to   use   sparse   Gaussian  elimination  to  obtain 
6  at  the   first  step  and  to   compute   6  at   succeeding   steps  with  a 

Richardson-D' Jakonov  iteration    (cf.    Richardson    [12]  ,   D' Jakonov   [k]) 
involving  the  factorization  of  J(V        ). 

In  particular,   suppose   that   at  the  k+l-st   step    (k   >,  l)   we  wish  to 
solve   the   system 

j(V(k))    6(k)   =  b(k).  (3.1) 

Since  we  have   already  factored  J(V        )    at   the   first  step,    the   system 

J(V        )   w  =    d 
can  be   solved  with  0(N   log  N)    arithmetic   operations.      We   can  then  solve 
(3.1)   by  using   a  Richardson-D 'Jakonov  iteration  to   generate   a  sequence  of 
iterates   6         ,8         ,    8         ,    ...    which   converges   to   6        .      (The   idea  here   is 

t      It   is    also  possible   to  use   a  scaled  conjugate  gradient   iteration  here 
(cf.    Douglas    and  Dupont    [5])»   but   the   analysis   is  more   complicated,   and, 
in   any   case,   the   results    are  improved  by   at  most   a  constant   factor. 


10 
similar  to   that   of  iterative   improvement,    cf.    Forsythe   and  Moler    [9]» 
pp.    U9-5M  •      For  6  =0  and  a   fixed  real    constant  y,   the   Richardson- 


D'Jakonov  iterates   satisfy 


Als.)  ..   .(k)  (k) 

6  =6.        -  ye. 

~i+l        ~i  ~i 


(k) 
where  e.    is  the  solution  to  the  linear  system 
~i 


J(V(0))  efk)  =  j(V(k))  6<k)  -b(k). 


~i       ~     ~i 

Each  step  of  the  iteration  requires  0(N  log  N)  arithmetic  operations,  and 

(k) 
the  iterates  will  converge  linearly  to  6    if 

p  =  ||I-  yJtV^rW10)!!  <  1 
(cf.  Young  [15],  p.  77). 

To  maximize   the   rate  of  convergence,  we   choose   y   to  minimize  p.      If 
all  the   eigenvalues  of  J(V        )      j(V        )    lie   in   the   interval   [a,g]  ,   this   is 
accomplished  by   choosing 

2 


y  = 


a+3    ' 
yielding 

p   =    (g   -  a)/(a  +    3). 
For  the  model  problem,   p    can  be  made   independent  of  h  by   choosing  V  near 

enough  to  V   (see   Section   5),   so  the   Richardson-D' Jakonov  iteration  will 

t 
converge  linearly,   independent  of  the  mesh  size. 

Different  Newton-Richardson  methods    are  obtained  by  varying  the 

number  of  Richardson-D' Jakonov  iterates    computed  at  each   step  of  the  outer 

k 
iteration.      Of  particular  interest    are   the   case   in  which  2      iterates   are 


t      It  has  been   shown  by  several   authors    (.cf.    Young   [15])    that  the   rate  of 
convergence   is   increased  by  a  constant  factor  by  using  Tchebychev 
acceleration  to   choose   a  sequence  of  parameters    {yvl   instead  of  the 
single  parameter  y. 
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computed  at  the  k-th  step  and  the  case  in  which  just  one  iterate  is  computed 
at  each  step.   In  Section  5  we  will  show  that  if  V    is  sufficiently  close 
to  V,  then  the  first  of  these  methods  converges  quadrat ically  to  V,  while 
the  second,  which  reduces  to  a  relaxed  chord  method,  converges  linearly. 
However,  in  either  case  the  solution  of  the  model  problem  requires  a  total 
of  0(N(log  N)  )  arithmetic  operations  in  addition  to  the  cost  of  the  initial 
factorization  of  J(V   )  (which  we  treat  as  preprocessing). 


12 
h.      Convergence  Results 

In  this   section  we  present    an   analysis   of  the  local   convergence 
properties   of  Newton-iterative  methods.      We  will   show  that   it   is   possible 
to   adjust  the   rate  of  convergence  of   such  methods  by  varying  the  number  of 
inner  iterations   taken   at   each  step  and  that  by  taking  2      inner  iterations 
at   the  k-th   step,   quadratic   convergence   is   obtained.      Since  the  Newton- 
Richardson  methods    described  in  the  last   section   are   a  particular  instance 
of  Newton-iterative  methods,   the   convergence   results    for  them  follow 
immediately.      The  proofs  of  the   results    described  here   are    contained  in 
the   appendix. 

We  begin  by  restating  our  problem  in   a  more  precise   and  formal  way. 

N  N 

If  R   is    a  bounded  subset  of  R     and  F:      R  -»■  R     is   a  nonlinear   function,  we 


consider  the   solution   of 


F(x)  =  0,  (U.l) 


* 

where  we  assume  that  a  solution  x  exists.   We  also  assume  that  there  is  an 

ii      *i  i 
r  >  0  such  that  if  S  =  {x:   | |x  -  x  |   <  r  },  then 

(i)    F  is  differentiable  in  S  ; 

3F  * 

(ii)    F'(x)  =  —  (x)  is  nonsingular  at  x  ; 

~  oX        ~ 

(iii)      There  is    an  L   <  +  °°  such  that   for  x  e   S_  , 

|  |F'(x)   -  F'(x*)|  |    <:  L|  |x  -  x*|  |. 

It  can  be  shown  that  the  discretized  model  problem  of  Section  2 

satisfies  (i)  -  (iii)  (cf.  Greenspan  and  Parter  [10]),  so  that  we  may 

directly  apply  the  theory  developed  here.   In  the  next  section  we  will 

consider  the  model  problem  as  a  practical  illustration  of  the  theory. 

From  assumptions  (i)  -  (iii),  it  follows  that  there  is  a  0  <  r_  £  I'- 
ll    *i  i 

such  that  F'  is  continuous  and  nonsingular  in  S]  =  {x:    |x  -  x  |   <  r..  } 
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(cf.    Ortega  and  Rheinboldt    [ll],   p.    U6).      Thus  Newton's  method  can  be   used 

*  (0) 

to   solve    (U.l)    for  x   .      For  a  given  x  e  S.  ,  we  generate  the  sequence  of 

iterates   x        ,  x        ,  x        ,    ...   "by 

5(»D..  ,(*)■■_  jOO,  (u.2, 

where 

F'(x(k))    6(k)   =   F(x(k)).  (k.3) 

It    can  he   shown  that   for  some  0   <   r     £  r    ,   Newton's  method  converges 

*  (0)  ii  *, | 

quadratically   to  x     for  x  e  S     =    {x:         |x  -  x    | |    <   r_}    (cf.    Ortega  and 

Rheinboldt    [ll],  p.    312). 

Newton-Richardson  methods   are  a  specific  instance  of  general  Newton- 
iterative  methods   in  which   a  linear  iterative  method  is   used  to   approximate 
the   solution  of   (k.3)   at  each  step.      A  common  way  of  generating  Newton- 
iterative  methods   involves   splitting  F' (x)   by  writing  it   as 

F'(x)  =  B(x)   -   C(x),  (k.k) 

where 

(iv)   B(x)  is  continuous  at  x  and 

(v)   B(x  )  is  nonsingular. 
Furthermore,  if  the  splitting  (U.U)  is  to  lead  to  a  computationally  efficient 
method,  it  must  be  possible  to  quickly  solve  the  system 

B(x)  w  =  d.  (k.5) 

It   follows    from   (iv)   -    (v)    that   there   is   a  0   <   r_   £  rp  such  that  B   is 

*i 

continuous   and  nonsingular  in  S     =    {x:         |x  -  x    |       <  r   }    (cf.   Ortega   and 

Rheinboldt    [ll],   p.    k6)\    and  if  we  let 

H(x)   =  B(x)"1  C(x) 

=   I  -  B(x)"1  F(x), 
it   then   follows   that  H  is   continuous   in  S_. 


Ik 

The  Newt on- Richards on  methods   are  obtained  with  the   splitting   (U.U) 
in  which  B   and  C  are    defined  by 

B(x)   =  -  F'(x(0)),  C(x)   =  B(x)   -  F'(x), 

where  y   is    a  real  constant  satisfying  0   <  y    <   2  and  x  e  S_  .      Clearly 

this   splitting  satisfies    (iv)    and   (v)  ,   and  once   F'  (x        )   has  been   factored 
at  the    first   step,   the  system  (h.3)    can  be  solved  quickly. 

At  each  step  of  a  Newton-iterative  method,   we   use   the  linear 
iteration  to   generate   a  sequence  of  inner  iterates    6         ,6         ,   6^      ,    ...    in 
which  SXJ   =   0   and 

.00  _  .00        (k) 

6.  .-     -    0.  -    E .  , 

-1+1       ~1  -i 

where 

B(x(k))    5<k)   =   F'(x(k))    «<k)   -F(x(k)). 

For  any  positive  integer  m,  let 

A  (x)  =  (I  +  H(x)  +  H(x)2  +  ...  +  H(x)111"1)  B(x)-1 

m  ~  ~  ~ 

=  (I  -  H(x)m)(l  -  H(x))'1  B(x)"1 
=  (I  -  H(x)m)  F'(x)-1. 


Then  letting 


and 


we  have 


Bk=   B(x(k)) 


Hk  =  H(x(k)), 
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|W   =   8<k!    -B-V(xW)    «!k)    -F(x(k))) 

~i+l        ~i  k  -  ~i 

=  IL    6<k)  ♦  BT1  r(»W) 

k   ~i  k 

=   H^   6^  +  B"1   F(x(k))]    +  B"1  F(x(k)) 

=  Hk^+(l  +  Hk)Bk_1^(k)) 

-  H^+1    &{0k)   +    (I  +   Hk  +    ...    +  h£)    B"1   F(x(k)) 

=    (I  +  Hk  +    ...    +  h£)    B"1   F(x(k)) 

=   Ai+1(x(k))FU(k)). 

At  the  k-th  Newton-iterative  step,  we   define 

6(k)   _   6(k) 

for  some  positive  integer  m.   Letting 

G  (x)  =  x  -  A  (x)  F(x) , 
m  -    ~    m  ~    - 

we  have 

-o  (x(h)). 
\  - 

If    | |h(x   )|      <  1,   then  under  assumptions    (i)   -   (iii)    it   can  he   shown  that 

for  some  0    <  r,    s  r    ,   the  outer  iterates    {x        }    defined  hy    Qi.6)    exist    and 

*  (0)  II  * i i 

converge  to  x     for  x  e  S,    =    {x:       [  |x  -  x  <  r,  }    (cf.   Ortega  and 

Fheinboldt    [11  ],   pp.    350-351 ).+ 

t      It   actually  suffices   that  the   spectral   radius   of  HCx   )   satisfy 

p(H(x*))    <  1,  hut   in  order  to   avoid  problems   in   relating  norms  to  the 
spectral   radius  ,  we  give   all  our  results   in  terms  of  the  Euclidean  norm, 
(if  H(x*)    is   symmetric,   then   of  course  we  have 

I |H(x*)||    =   p(h(x*)).) 
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The   rate   at  which  the  Newton-iterative   iterates   converge   to  x 
depends  mainly  on    |  |H(x   )|      and  the   sequence    {m,  }.      To  obtain   our  main 

convergence   result,  we  make   two   further  assumptions   on  the   splitting    (U.U): 

ii*ii 
(vi)        For  some   constant   A,   0    <   A    <  1,    |[H(x)|      <   A; 

(vii)      There   are  an  L      <  +  °°   and  a  0   <   r     £   r,    such  that   for 

ii  *i  i 

xeS=    {x:       |  |  x  -  x    ||    <   r   }  , 


|B(x)   -  B(x    )| |    *  L1| 


*l 

X    -    X     ' 


We   now  state  our  main   convergence   result,    deferring   its  proof  to 
the   appendix. 

Theorem  k.l:      Let  F  satisfy    (i)    -    (iii)    and  let  F' (x)   =  B(x)    -   C(x)  be   a 

splitting  which  satisfies    (iv)    -    (vii).      Let  m    ,  hl  ,  m    ,    ...   be   a  sequence 

of  positive  integers,    and  define 

k-1 
m  =  max   [U,m   }   u    {(m,    -      I     m   ):      k  =  1,2,3,...}]. 
U  *        1=0     l 

Then,   if  m   <  +  °°,   there   are   a  0    <   r   <;   r      (depending   on   X,  H,   and  F)    and  a 

positive   constant   c    (depending   on   A   and  m)   with   X    <   c   <  1   such  that   for 

(0)  *i  i 

x  e   S  =    {x:         |x  -  x    ||    <   r},    the   sequence  of  iterates    defined  by    (h.6) 

satisfies 

I  |    (k+1)  »j.  "k    ,,    00  *,  I 

||x  -x||£c  |x  -   x    I  I . 

While  the   restriction  on  m  in  Theorem  U.l  may  seem  contrived,   it 
simply  reflects   the   fact   that  Newton-iterative  methods   cannot  be   expected 
to   exhibit   super-quadratic   convergence.      Hence   choosing  ni     >   2     is   pointless, 
If  we   choose  m     =   1  or  iil    =   2    ,   then  m  =  1    in  the  theorem,    and  we  have   the 
following  corollaries. 


IT 
Corollary  U.2:      Let  F  satisfy    (i)   -    (iii)    and  let   F1  (x)   =  B(x)    -   C(x)   be   a 
splitting  which  satisfies    (iv)   -    (vii).      Define  the   sequence 
{m.:      k  =   0,1,2,...}  by  nL    =  1.      Then  there   are   a  0   <   r   <:   r      (depending  on 

X,  H,   and  F)    and  a  positive   constant   c    (depending   on  X)   with  X    <   c   <   1   such 

(0)  ii  *, | 

that   for  x  e   S  =    {x:       |  |x  -  x    |       <   r},   the   sequence  of  iterates    defined 

by    (h.6)    converges   linearly,   i.e., 

I  I    (k+1)  *i i  ii    (k)  *, | 

[ |x  -  x   I |    £   c| [x         -  x   | | . 

Corollary   U.3:      Let   F  satisfy   (i)   -    (iii)    and  let  F' (x)   =  B(x)   -   C(x)   be   a 

splitting  which  satisfies    (iv)    -    (vii).      Define   the   sequence 

{hl  :      k  =   0,1,2,.'. .  }  by  m.    =  2    .      Then   there   are   a  0   <   r  £   r      (depending  on 

X,   H,   and  F)    and  a  positive   constant  c    (depending  on  X)  with   X   <   c   <  1   such 

(0)  ii  *, | 

that   for  x  e   S  =    {x:       | |x  -  x   |      <   r},   the   sequence  of  iterates   defined 

by   (h.6)    converges   quadratically ,    i.e., 

II    (k+1)  *,,  2kM    (k)  *,, 

X  -    X  £     C  X  -    X  . 
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5.      Ramifications  of  the  Theory   for  the  Model  Problem 

Corollary"  U.2   expresses   the  previously  known   result   that   a  Newton- 
iterative  method  converges   linearly   for  m     =   1   (cf.    Ortega  and  Rheinboldt 
[ll],  p.    350).      The   real   import  of  Theorem  k.l  is   the   rather  surprising 
result  of  Corollary   U.  3,    i.e.,   that  we   can   obtain  R-quadratic   convergence 
with  a  Newton-iterative  method  by   choosing  hl    =   2    .      Since  Newton's  method 
itself  is   only  quadratically   convergent    (cf.    Ortega  and  Rheinboldt   [ll] , 
p.    312)    this    is   the    fastest   convergence   that    can  be  expected  for  any 
Newton-iterative  method. 

A  problem  with  the   result  of  Theorem  U.l   is   that   c   depends  on   X: 

specifically,   A    <   c   <  1.      To   reduce   the   initial  error      |v  -  v|  j    by   a 

factor  of  K   <  1   requires    I  outer  iterations  ,  where 

l-l     m 
K  *      tt      c      .  (5.1) 

k=0 

Taking  logarithms ,  we  obtain 

l-l 
log  K  ~  (   Z   m  )(log  c), 
k=0  K 


so   that   I   satisfies 

l-l 

I      til     *   log  K/log   c    ^   log  K/log   X. 
k=0     k 

As   an  illustration,  we    consider  the  model  problem  of  Section  2. 

For  Newton-SOR  methods   it   can  be   shown   that  even  with  the  optimal   SOR 

parameter  co,    log   X   ~    -2iTh    (cf.    Young   Il5],   p.    190).      Reducing  the   initial 

2 
error  by   a  factor  of  the   discretization  error  h     requires    I  iterations, 

where   £   satisfies 
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U1  2 

E      hi     ~   log  h   /log    X 

k=0     K 

8    (-log  N)/(-2irh) 


I  /p 
=   0(N  '    log  N). 


Thus   if  hl    =   2    ,  we  have 


^"1  0  -l  /p 

£     m     =   2     =   0(N   '    log  N)  , 

k=0 
so  that  &   =   0(log  N).      And  if  bl    =  1,   then  we  have 

4"1  1/2 

Z      hi     =   H  =    0(N   '    log  N). 

k=0     k 

In  either  case,   though,   since  each  inner  SOR  iteration   requires   0(N)   opera- 

3/2 
tions    (cf.    Young    [l5J)j   a  total   of  0(N        log  W)    iteration  time   is    required, 

3/2 

as   compared  to  the   0(N        log  log  N)   time   required  by  Newton's  method. 

For  Newton-Richardson  methods,   however,   the   situation  is   quite 
different.      We  have 

||h(x*)||  =   MbCx*)-1  C(x*)|| 


YF'Cx^^"1    [(   h  )    F'(x(0))    -  F'(x*)] 


Y 


i  -  yF'U^V1  f«(x*)||. 


Since   Fr (x)        is    continuous   in  S    ,  we  may  make    | |H(x    )|      arbitrarily   close 

i  i  i  i    (0)  *i  i 

to    |1   -  y|    by  choosing    | |x  -  x    | |    small   enough.      In  particular,  we   can 

always  make      |h(x    )|      <    (l  +    |l  -   y\)/2   <  1.      For  the  model  problem,   then, 

the   constants   X   and  c  of  Theorem  k.l  are  independent  of  h   for  Newton- 

2 
Richardson  methods.      Thus   to   reduce  the   initial  error  by   a  factor  of  h 

requires   I  outer  iterations,  where 

£-1  2 

I     m     ~   log  h   /log   c  =  -log  N/log  c 
k=0     K 


20 
and  log   c   is    a  constant   independent  of  N.      Hence   if  itl    =   2    ,  we  have 

2     n.    s   2     =   -log  N/log  c, 
k=0 

so   that   £  =   0(log  log  N).      And  if  m.    =   1,  then  we  have 

£-1 

I     m     =    I  ~    -log  N/log   c  =  0(log  N). 
k=0     k 

This   leads  to   the   following  result  on  the    cost  of  solving  the  model  problem 

with  Newton-Richardson  methods    (for  a  proof,    see  Eisenstat,   Schultz,   and 

Sherman    [T]»   Sherman    [13]). 


Theorem  5.1:      Consider  the   use   of  Newton-Richardson  methods   to   solve  the 
model   semilinear  problem  presented  in  Section  2,   and  assume  that   V  and 

Y   are   chosen  so   that  Theorem  k.l  holds    for  c   independent  of  h.      Then 
ignoring  the   costs   of  preprocessing   and  the    factorization   at   the    first 
step,  we  may  reduce   the   initial   error    | |V         -  V| ]    by   a  factor  of  h     in 
0(N(log  N)    )    iteration  time  by   choosing  either  itl    =   2     or  m,    =  1. 

From  Theorems   k.l   and  5.1  we    conclude   that  the    choice  of  m,     for 

k 

Newton-Richardson  methods   should  be  based  on  the   relative   costs  of  function 
and  derivative  evaluations.      For  the  model  problem,    choosing  itl    =   2 
requires   0(log  log  N)    function  evaluations    and  0(log  log  N)    derivative 
evaluations,  while   choosing  m     =   1  requires   0(log  N)    function  evaluations 
and  only  one   derivative   evaluation    (i.e.,   F' (V        )).      Since   function 
evaluations   and  derivative   evaluations    are   equally  costly  for  the  model 
problem  (   —  (V        )    is    a  diagonal  matrix)  ,   it  seems  best  to    choose  m,    =   2    . 
However,   we  have  observed  experimentally  that   for  other  problems   it  may 
be  more   efficient   computationally  to   choose  itl    =  1    (cf.    Sherman    [13]). 
The   reason    for  this   is   that   derivative   evaluations    are  often   far  more 
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costly  than   function  evaluations    for  problems  more  general   than  the  model 
problem. 
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6.      Conclusion 

In  this  paper  we  have   given   a  general   analysis   of  the   local   rates 
of  convergence  of  Newton-iterative  methods   for  the   solution  of  systems   of 
nonlinear  equations.      In   addition,  we  have  presented  the  Newton-Richardson 
methods,   a  particular  class   of  Newton-iterative  methods  which  appear  to 
have   computational   advantages   over  both  Newton's  method  and  other  Newton- 
iterative  methods    for  sparse  nonlinear  systems.      In  practice   these  methods 
have  performed  quite  well    (cf.    Eisenstat,  Schultz   and  Sherman    [7],   Sherman 
[13]),   although  they   are  by  no  means   good  methods    for  all  problems.      It   is 
clear  that  the   choice  of  a  method  for  a  particular  nonlinear  system  depends 
heavily  on  the   nonlinear  operator  involved  and  the    context   in  which  it 
arises . 

In   closing,    it  is  worth  pointing  out  that  the  ideas   underlying  the 
Newton-Richardson  methods  have   important   applications   in  many  other  settings. 
Already,   a  number  of  authors    (e.g.,   Bank    [l],    Concus    and  Golub    [2],   Douglas 
and  Dupont    [5])   have   used  SIP-like   iterations   to   solve  linear  systems    arising 
in  the  numerical   solution  of  partial    differential   equations,   and  in  the 
future  similar  techniques  will  no   doubt  be    applied  to   a  wide   variety  of 
nonlinear  and  time-dependent  problems.      In   fact,    it  would  seem  that   the 
idea  of  solving  difficult  problems  by  iterating  on  the   solutions  of  nearby, 
easily-solved  problems   is   one  which  should  prove   fruitful  whenever  it   can 
be   applied  in   a  natural  way. 
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Appendix:      Proofs  of  Convergence   Results 

In  this   appendix  we  shall  prove  Theorem  k.l.     We  will  make   frequent 
use   of  the   assumptions   and  results  of  Section  U,    and  the   reader  is   advised 
to   refer  to   that  section,  particularly   for  the   statements  of  assumptions 
(i)   -   (vii).      Much  of  our  analysis   follows   that  of  Ortega  and  Rheinboldt 
[11],  pp.    350-352. 

Lemma  A.l:      Let  F  satisfy   (i)    -    (iii)    and  let  F'(x)   =  B(x)   -   C(x)   he   a 
splitting  which  satisfies    (iv)    -   (vi).      Then   there   is    a  0   <  r^   $   r,     such 
that   for  x  e   S^-  =    {x:       |  |x  -  x    |       <   *>}, 

I |h(x)||  *  a 

and  for  any  positive   integer  m, 

||H(x)m-  H(x*)m||    v<  mA^llHCx)   -  H(x*)||. 
Proo f :      Since  H  is   continuous   in  Si  ,  we   can   choose   r^   £  r,    so  that 

| H(x)  I      £  A   for  x  e  S^-.      The   second  part  of  the  lemma  follows  by  induction. 
It  holds   trivially  for  m  =  1;   and  if  it  holds   for  m  =  k,  then 

|  |H(x)k+1  -  H(x*)k+1|  |   =    | |H(x)k   (H(x)    -  H(x*))   +    (H(x)k  -  H(x*)k)H(x*) |  | 

S   Xk||H(x)   -  H(x*)|  |    +  kA1""1!  |H(x)    -  H(x*)||     |  |h(x*)| 

$    (k+l)Ak||H(x)   -  H(x*)||.  D 

Lemma  A. 2:      Let  F  satisfy   (i)   -  (iii)    and  let  F1 (x)   =  B(x)    -   C(x)  he   a 
splitting  which  satisfies    (iv)   -    (vi).      Then  there   are  positive   constants 

c,    <  +  °°   and  c      <  +  °°   and  a  0   <  r     £   r^   such  that   for  any  positive  integer 

ii  *i  i 

m  and  xeS={x:       |  |x  -  x   |  |    <  r   }  , 

|G   (x)   -  x    I  I    <:   cJ  |x  -  x    I  I2  +   c0|  |H(x)   -  H(x   )|       I  |x  -  x   I      +   Am|  |x  -  x    I 

Proof:      From  (iv)   -    (vi)    and  Lemma  A.l,   it   follows   that  we  may  choose   r_ 


2k 
so   that  there   is    a  B    <  +  °°  such  that    for  x  e   S    ,    |  |h(x)  |      £   A   and 
|  |b(x)_1|  I    <:   6    (cf.   Ortega  and  Rheinboldt   [ll],  p.    351).      Then   for  x  e   S 
we  have 

I |G   (x)    -  x*| |    *    | |x  -  A   (x)    F(x)    -  X*  -  H(x*)m(x  -  x*) I  I    +    I |H(x*)m(x  -  x*) 

=    ||-Am(x)    F(x)   +    (I  -  H(x*)m)(x  -  x*)||    +    ||H(x*)m(x  -  x) \ \ 
=    ll-A   (x)    F(x)   +  A  (x*)   F'(x*)(x  -  x*) I I    +    I |H(x*)m(x  -  x*) I 


AA  y.  JJ£ 

S    |    -A   (x)[F(x)   -   F(x   )   -  F' (x    )(x  -   x   )]||    + 
||Am(x)[F'(x)    -  F'(x*)](x  -  x*)||    + 

| | [A   (x)    F'(x)    -  A   (x*)    F'(x*)](x  -  x*)||    +   Am||x  -  x*| 
m  ~  ~  ra~  -  ~        ~       '  ~        ~ 


Using 


I I A   (x)||    =    | | (I  +  H(x)   +  H(x)2  +    ...    +  H(x)m"1)B(x)' 


*    (1  +    A  +    A2  +    ...    +    Am_1)| 


*   1-A    » 
we   can  bound  the  terms  of    (A.l)    individually.      From  the  mean  value   theorem 
we  obtain 

ll-A   (x)[F(x)    -   F(x*)    -   F'(x    )(x  -   x   )]|| 


<      I A   (x) I         |x  -  x    |         sup        |F»(x     +  t(x  -  x    ))    -   F' (x    ) I 
m~  "        "  _    ,    _  ~  ~~  ~ 

QStSl 


$      |A   (x) I         |x  -  x    I         sup     L      |t(x  -  x   ) 
m  ~  0$t*l 

*   6L/(1   -   A)    i  |x  -  x    I |2. 


The   second  terra  of   (A.l)    is   bounded  by 
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I |Am(x)[F'(x)   -  F«(x*)](x  -  x*)||    $    llAm(x)| |     ||F'(x)   -  F'(x*)| |     ||x  -  x*| | 

$   PL/(1   -   X)    | |x  -  x    | |2. 

Lemma  A.l  now  yields 

| | [ A   (x)    F« (x)   -  A   (x*)    F' (x*) ] (x  -  x*) | |    =    | | (H(x*)m  -  H(x)m) (x  -  x*j | | 

I  '      m  ~  ~  m~  ~  *•        «•  ~  ~  ~        ~         ' 

*  mX*1"1    |  |H(x)   -  H(x*)|  |    ||x  -  x*||. 
Since   X    <  1 ,  we   can  choose   c     =   c    (X)    <  +  °°  so   that  the   set    {mX        }   is 
bounded  by   c    .      Letting  c     =    c   (X,   r    ,   8,   L)   =   2BL/(l   -   X)    and  combining 
terms,  we  obtain 

ii  *  i    i  ii  ^  I    i  ?  11/.  /^>iiii  *  i    i  TTlii  *  i    i 

G   (x)    -  x  S  a.      x  -  x      r  +   cj    H(x)   -  H(x  x  -   x  +   X        x  -  x        .    □ 

II  m  -  -    "  1 '    ~        ~  2.  -  ~      i>i'r        -11  M-        ~    '  ' 

Assumption    (vii)    is   not  the  only   restriction  which   could  be  used  to 

prove  Theorem  U.l,  but   it  is   one  which  the  Newton-Richardson  methods   satisfy 

trivially  since  B(x)    is   a  constant   function.      From  assumptions    (iii)    and 

(vii),   it  follows  that   C(x)    also   satisfies    a  Lipschitz  condition.      Thus, 

since  we  have   already  noted  that  B  is   continuous    and  nonsingular  and  B 

is   uniformly  bounded  in  S7,    it    follows   that   there   are   an  L]    <  +  °°   and  a 

0   <   r0   <:  min(r,_,   r„)    such  that   for  x  e  SQ  =    {x:  x  -  x  <  rD} 

o  5        7  -.(j-ii~-i"  H 

| |H(x)    -  H(x    ) | |    £  L    | |x  -  x    | |. 
Using  this   result,  we  now  prove   Theorem  k.l. 

Proof  of  Theorem  k.l:      Let   8,    c    ,    c    ,    and  r     be   defined  as   in  Lemma  A. 2 
and  define 

c(t)   =    ((cx  +   c2  L]_)t  +   X1)1/m. 

We   select   r  <   rft  so   that   c(r)    <c=    (X+l)/2<l   and   choose  x  e   S.      We 

can  now  prove   the  theorem  by  induction  using  Lemma  A. 2.      For  k  =   0  we  have 
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x(l)-x*||=    ||Gm(X(°h-x*| 

-  mo  ~ 


*   Cl||x(0)    -x*||2+c2||H(x(0))    -H(x*)||     ||x(0)    -x*||    + 

Xm°||x(0^/|| 
i    ((cx  +   c2Ll)r  +   A1"0)    ||x(0)    -   5*|  | 

w  m     i     >%  ii  (°)      *n 

<(c       -    X    +    X        )  X  -    X    II. 


m. 
„       „       ,    0       ,  m 

But    0<X         <X<c,so 


and  we  have 


m        ,        ,    0  m  0 

0<c     -X+X        £c      £   c      , 


(D        *|i   <    mo  I,   (o)        *, 

x  -   x    |       £   c  [x  -   x    I 


If  we    assume   that  the  theorem  holds    for  0   £  k   <   j,  then   for  k  =   j 
we  have 
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l;(J+1)-j*ll-l|on(x(J))-=c*! 

J 


*  Cl||x(j)  -  x*||2  +  c2||H(x(j))  -  H(x*)  |  |    ||x(j)   -  x*| 


m. 


*WJ>-X* 


t  (o1*e2L1)    ||x(J)  -x*||2+  »J    ||x<J)  -  x*| 

S  I(c1+  =2LX)(  V  c"*  )    ||x(0)  -5*1|  +  X^]    ||x(J) 


*    [(C]L  +   c2  Lx)r  c  J        +   X   J]    ||xUJ   -  x 


*   [(cm  -   X)c  J        +  X  J]    ||x(j)   -  x 


m .  m .  — m         m .  /  .  \  ^ 

=    [c   J   _  xc  J        +   X   J]    ||xUJ    -  x    I 


m .  m .      -m 


=   c 


I  .  -Ul  .  —Ill  .  /    .   \ 

J[l-   Xc"m+    X   J    c      J]    ||x(j)    -   x 


* 


m.  m.— 1     m— m.  /  .\  . 

=   c   J[l  +   Xc"m(x  J        c        J    -  1)]    ||x(j)    -  x' 


But  0    <   X   <   cm   <  1,    so    Xc~m   <  1   and 


Hence 


and  we  have 


m.-l     m-m.  m(m.-l)  m-m.  (m-l)m. 

0<XJ        c        J    <  c        J        c        J   =    c  J<1. 


m.-l     m-m. 
-1    <   Ac-m(X  J        c        J   -   1)    <  0, 


i   (1+1)        *i  i        mi  i  i    (J)        *i  i 
xVJ  -  x  £  c  J      xVJ/   -  x        . 
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